
Molecular Systems Biology

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Molecular Systems Biology's content profile, based on 142 papers previously published here. The average preprint has a 0.06% match score for this journal, so anything above that is already an above-average fit.

1
Automated mini-bioreactors reveal the temporal dynamics and multi-omics responses of CRISPRi knockdowns in Pseudomonas putida

Saavedra, M. A.; Grassi, S.; Jespersen, M. G.; Rocha, C.; Kandasamy, V.; Nikel, P. I.; Nielsen, L. K.; Donati, S.

2026-03-06 synthetic biology 10.64898/2026.03.06.709552 medRxiv
Top 0.1%, match score 12.2%

Characterizing CRISPR interference (CRISPRi) phenotypes presents a fundamental temporal challenge: pre-existing overabundance of target proteins can mask early silencing, requiring extended growth for dilution, yet prolonged repression rapidly selects for escaper mutants. To resolve this, we integrated a tightly regulated CRISPRi system in Pseudomonas putida with an automated mini-bioreactor platform operating in turbidostat mode. By maintaining continuous exponential growth, we mapped the temporal dynamics of essential gene silencing. We identified a critical observation window between 17 and 27 hours (7-9.5 cell doublings) in which repression exerts its maximum physiological impact, directly preceding population takeover by escapers carrying target-site mutations. Applying this workflow to the arginine biosynthesis pathway, multi-omics profiling disentangled transient physiological buffering from long-term mutational events, revealing that argH and argG knockdowns trigger highly diverse metabolomic perturbations. This scalable framework overcomes batch culture limitations, ensuring precise temporal control for accurate phenotypic characterization and reliable functional genomics.
[Figure 1]
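The dilution problem described above can be put into rough numbers. As a minimal sketch, assuming (for illustration only) that repression halts all new synthesis and that the pre-existing target protein is stable, so it is lost only by dilution at each division, the fraction remaining per cell after n doublings is 2^-n. Under those idealized assumptions, the 7-9.5 doubling window corresponds to roughly 0.78% down to 0.14% of the starting protein level; proteins with active turnover would be depleted faster. The function name below is hypothetical.

```python
def remaining_fraction(doublings: float) -> float:
    """Fraction of a pre-existing protein pool left per cell after `doublings`
    generations of exponential growth.

    Idealized assumption: zero new synthesis and no degradation, so the pool
    is only halved by dilution at each division.
    """
    if doublings < 0:
        raise ValueError("doublings must be non-negative")
    return 2.0 ** -doublings


if __name__ == "__main__":
    # Observation window reported in the abstract: 7 to 9.5 cell doublings.
    for n in (7, 9.5):
        print(f"{n} doublings -> {remaining_fraction(n):.3%} of the initial level")
```

Real turnover would add a degradation term on top of dilution, so this is a lower bound on how quickly a target protein could be cleared.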

2
Structured Schemas for LLM-Modeler Collaboration in Quantitative Systems Pharmacology Model Calibration

Eliason, J.; Popel, A. S.

2026-03-09 systems biology 10.64898/2026.03.05.709623 medRxiv
Top 0.1%, match score 9.9%

Quantitative systems pharmacology (QSP) models require calibration data from published literature, yet manual curation produces inconsistent documentation while large language model (LLM) extraction exhibits hallucination and fabrication errors unacceptable for quantitative modeling. We present MAPLE (Model-Aware Parameterization from Literature Evidence), a framework that uses structured validation schemas as a collaboration interface between LLMs and modelers. Two complementary schemas capture calibration data at different scales: one for isolated experiments that constrain individual parameters through simplified forward models, and one for clinical and in vivo endpoints that constrain the full model through species-level observables. Both schemas separate data extraction from modeling decisions, capturing literature values with full provenance in a machine-verifiable form. Targeted validators catch characteristic LLM errors: value-in-snippet matching detects hallucinated values, DOI resolution flags fabricated citations, and code execution catches malformed forward models. We evaluate MAPLE on 87 calibration targets for a pancreatic ductal adenocarcinoma (PDAC) QSP model, using two collaboration modes: batch LLM extraction followed by interactive curation, and interactive extraction where modeler and LLM collaborate in real time. Both modes required substantial modeler input: the modeler changed forward model types in 65% of SubmodelTargets, adjusted prior parameters in 46%, and revised source relevance assessments in all files. Interactively extracted targets embedded modeler effort in the extraction process, producing near-final output. The schemas ensure completeness and enable reproducible, provenance-rich calibration regardless of workflow.
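Value-in-snippet matching, as described in the abstract, checks that an extracted number actually appears in the source text the LLM quotes. The Python sketch below is a hypothetical illustration of that idea, not MAPLE's implementation: the function names, the number-matching regex and the relative tolerance are all assumptions.

```python
import math
import re

# Hypothetical pattern: integers, decimals and simple scientific notation
# (e.g. "1.5e-3"). Signs are deliberately not captured, so that a range such
# as "10-20" still exposes 20 rather than -20; comparison uses abs(value).
_NUMBER = re.compile(r"(?:\d+\.\d*|\.\d+|\d+)(?:[eE][-+]?\d+)?")


def numbers_in(snippet: str) -> list[float]:
    """Return every numeric literal found in a quoted text snippet."""
    # Drop thousands separators such as "1,200" before matching.
    cleaned = re.sub(r"(?<=\d),(?=\d{3}\b)", "", snippet)
    return [float(m) for m in _NUMBER.findall(cleaned)]


def value_in_snippet(value: float, snippet: str, rel_tol: float = 1e-6) -> bool:
    """True if the extracted value appears in its own supporting quote.

    A value that cannot be found in the snippet is a candidate hallucination
    and should be routed to the modeler for review.
    """
    target = abs(value)
    return any(math.isclose(target, x, rel_tol=rel_tol) for x in numbers_in(snippet))
```

A `False` result would not by itself prove fabrication (a value may have been unit-converted, for example); it only marks the entry for human review.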

3
Disagreement among variant effect predictors guides experimental prioritization of target proteins

Jonsson, N. F.; Marsh, J. A.; Lindorff-Larsen, K.

2026-03-20 bioinformatics 10.64898/2026.03.18.712765 medRxiv
Top 0.1%, match score 9.1%

Interpreting the functional consequences of genetic variation, especially rare missense variants, remains a significant challenge in human genetics. Computational variant effect predictors (VEPs) and multiplexed assays of variant effects (MAVEs) provide complementary approaches, with VEPs offering scalable predictions and MAVEs delivering detailed empirical measurements. However, MAVEs are resource intensive and cannot yet be applied broadly across the proteome, making it important to identify proteins where experimental mapping will be most informative. We hypothesised that MAVEs should be particularly valuable for proteins where computational predictors disagree, as such disagreement may highlight mechanistic blind spots. To test this, we analysed predictions from ten distinct VEPs across more than 13,000 human proteins and quantified inter-predictor concordance. We observed substantial variability across proteins in the degree of agreement between predictors and investigated structural, functional and gene-level features associated with this variation. We found that inter-VEP concordance was unrelated to how well VEPs agreed with experimental MAVE data. If predictor agreement reflected how intrinsically predictable a protein is, these quantities would be expected to correlate. Their decoupling instead suggests that MAVEs may provide information orthogonal to VEPs. We therefore propose using inter-VEP disagreement as a practical strategy to prioritise proteins for experimental characterization. Focusing on proteins with low predictor concordance should maximise the informational value of new MAVEs and improve variant interpretation in both research and clinical contexts.

4
A Manual of Procedures for the Generation of the AI-Ready and Exploratory Atlas for Diabetes Insights (AI-READI) Database

Matthies, D. S.; Edberg, J. C.; Baxter, S. L.; Lee, A. Y.; Lee, C. S.; McGwin, G.; Owen, J. P.; Zangwill, L. M.; Owsley, C.; AI-READI Consortium

2026-04-04 endocrinology 10.64898/2026.03.30.26349552 medRxiv
Top 0.1%, match score 9.1%

The ability to understand and affect the course of complex, multi-system diseases like diabetes has been limited by a lack of well-designed, high-quality and large multimodal datasets. The NIH Bridge2AI AI-READI project (aireadi.org) aims to address this shortfall by generating an AI-ready dataset to support AI discoveries in type 2 diabetes mellitus (T2DM). This manual of procedures provides a detailed description of the AI-READI protocol.

5
Robotic perturbation proteomics and AI agents enable scalable drug mechanism discovery

Jiang, Y.; Movassaghi, C. S.; Munoz-Estrada, J.; Sundararaman, N.; Momenzadeh, A.; Meyer, J. G.

2026-05-07 systems biology 10.64898/2026.05.04.722718 medRxiv
Top 0.1%, match score 8.5%

Large-scale mass spectrometry-based proteomic screening could reveal cellular mechanisms of drug action at systems resolution but remains limited by experimental complexity and the difficulty of extracting insight from high-dimensional datasets. Here, we describe an end-to-end platform that combines semi-automated sample preparation, rapid LC-MS/MS, and AI agent-based data analysis to enable scalable proteomic screening. In a screen of 172 compounds in HepG2 cells, we generated 1,232 proteomes with more than 8,700 quantified proteins in approximately three weeks. Agentic AI reduced data analysis and interpretation time to less than one day while translating proteomic measurements into structured mechanism-oriented summaries and experimentally testable hypotheses. Guided by this framework, we validated: (1) a cholesterol-lowering effect of methylene blue in vitro and (2) an association between loratadine exposure and increased circulating iron in matched electronic health record analyses. This work establishes a scalable platform for generating proteomic drug perturbation data and automatically converting that data into mechanistic insights and candidate translational hypotheses using AI.

6
Protein Stability, Turnover Kinetics, and Abundance Constrain the Scaling of Protein Interaction Networks

Goel, M.; Nissley, D. A.; Castellanos-Girouard, X.; Kuntz, C. P.; Wang, Y.; Mukhtar, M. S.; Serohijos, A.; Schlebach, J. P.

2026-05-14 systems biology 10.64898/2026.05.11.724303 medRxiv
Top 0.1%, match score 8.2%

The propensity of proteins to form oligomers is ultimately dictated by their structural configuration(s). Proteins that persist in a discrete conformational state may form a limited number of specific interactions, while those that sample a broader structural ensemble may instead associate with a wider array of partners. These intrinsic tendencies potentially constrain the way proteins navigate wider interaction networks. In this work, we aggregated and surveyed a wide variety of biophysical, biochemical, and cellular descriptors of the S. cerevisiae proteome to identify biases in the connectivity of its protein-protein interaction network. Using mass spectrometry-based interactome measurements and various protein stability estimates, we find that a disproportionate number of abundant yet unstable binding proteins act as network hubs. Moreover, we show that these features alone can discriminate between hub and non-hub proteins with high accuracy (AUROC = 0.898). Interestingly, we find that the half-lives of hub proteins depend on whether they reside within static complexes and/or whether they interact with molecular chaperones. Finally, we note that the observed connectivity biases associated with abundant, unstable proteins pertain only to network hubs, not to the bottlenecks that connect them. Together, our findings reveal how the conformational stability of a protein may constrain its context within protein-protein interaction networks.
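The reported AUROC of 0.898 has a simple rank-based reading: it is the probability that a randomly chosen hub receives a higher classifier score than a randomly chosen non-hub, with ties counted as one half. A minimal sketch of that definition follows; the inputs are toy values, not the study's features or classifier, and the function name is illustrative.

```python
def auroc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive (e.g. a hub) scores
    higher than a randomly chosen negative (a non-hub); ties count 1/2.
    This O(P*N) pairwise version is meant for small illustrative inputs.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, with hub scores 0.9 and 0.4 and non-hub scores 0.6 and 0.1, three of the four hub/non-hub pairs are ordered correctly, giving an AUROC of 0.75.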

7
Quantification of domain-specific intrinsic capacity using mortality data

Fuentealba, M.; Zhai, T.; Aldajani, S.; Gladyshev, V. N.; Snyder, M.; Furman, D.

2026-04-08 systems biology 10.64898/2026.04.06.714260 medRxiv
Top 0.1%, match score 8.1%

Functional health is centered on five domains of Intrinsic Capacity (IC): locomotion, cognition, vitality, psychological capacity and sensory capacity. Therefore, measuring IC at the domain-specific level is the cornerstone for developing preventive interventions that help individuals preserve their independence. In this study, we used 63 clinical features from the UK Biobank to develop IC age, an 18-year mortality risk estimator that approximates an individual's biological age associated with the decline of each IC domain. By establishing proteomic surrogates of IC age, we find immune system activation across domains and provide a proteomic framework that may facilitate scalable monitoring of functional health decline.

8
Shared multicellular injury programs of acute and chronic kidney disease enable mechanistic patient stratification

Fallegger, R.; Gomez-Ochoa, S. A.; Boys, C.; Ramirez Flores, R. O.; Tanevski, J.; Pashos, E.; Feliers, D.; Piper, M.; Schaub, J. A.; Zhou, Z.; Mao, W.; Chen, X.; Sealfon, R. S. G.; Menon, R.; Nair, V.; Eddy, S.; Alakwaa, F. M.; Pyle, L.; Choi, Y. J.; Bjornstad, P.; Alpers, C. E.; Bitzer, M.; Bomback, A. S.; Caramori, M. L.; Demeke, D.; Fogo, A. B.; Herlitz, L. C.; Kiryluk, K.; Lash, J. P.; Murugan, R.; O'Toole, J. F.; Palevsky, P. M.; Parikh, C. R.; Rosas, S. E.; Rosenberg, A. Z.; Sedor, J. R.; Vazquez, M. A.; Waikar, S. S.; Wilson, F. P.; Hodgin, J. B.; Barisoni, L.; Himmelfarb, J.; Jain, S.;

2026-03-06 nephrology 10.64898/2026.03.05.26347522 medRxiv
Top 0.1%, match score 8.1%

Acute kidney injury (AKI) and chronic kidney disease (CKD) are two interconnected clinical conditions, both defined by degree of functional impairment but with heterogeneous clinical trajectories. Using new transcriptomic technologies, recent studies have described the cellular diversity of the healthy and injured kidney at the single-cell level. Here, we used single-nucleus transcriptomics to investigate molecular diversity and commonalities in kidney biopsies from over 150 participants with AKI and CKD enrolled in the Kidney Precision Medicine Project (KPMP), and did so at the level of individual participants. Using an unsupervised approach, we identified two multicellular programs associated with clinical and histopathological features of acute injury and chronic damage, respectively. We found that these programs are expressed across patients with AKI and CKD, supporting shared, rather than distinct, underlying molecular mechanisms. These programs capture tissue-level compositional changes towards adaptive and failed-repair states in tubular epithelial cells, as well as intracellular molecular changes characteristic of stress in all cell types. We identified subunits of the NF-κB and AP-1 complexes, as well as members of the STAT family, as putative upstream regulators of the acute and chronic programs. We were able to map these continuous molecular measures of acute injury and chronic damage to urine and plasma protein profiles obtained at the time of biopsy. These non-invasive protein signatures were predictive of renal outcomes in an independent cohort of 44 thousand participants from the UK Biobank. In summary, unbiased identification of cellular programs in kidney disease biopsies defined molecular programs of injury that cut across conventional disease categories and established a non-invasive molecular link to long-term patient outcomes.
[Graphical abstract: Figure 1]

9
A trajectory-coupled network bottleneck governs gemcitabine resistance in 3D PDAC tissue models

Balkenhol, J.; Almasi, M.; Nieves Pereira, J. G.; Dandekar, T.; Dandekar, G.

2026-03-26 systems biology 10.64898/2026.03.24.713885 medRxiv
Top 0.1%, match score 7.0%

PDAC exhibits rapid chemoresistance, yet how drug-tolerant states arise remains unclear. Existing approaches miss how network topology evolves across cell-state transitions under drug pressure. A 3D PANC-1 tissue model on decellularized intestinal matrix was used for scRNA-seq across four conditions (control, gemcitabine [GEM], TGF-β1, GEM+TGF-β1). Pseudotime trajectory inference was combined with dynamic PPI network analysis. Findings were cross-examined in a PDAC atlas (726,107 cells, 231 patients; Loveless et al., 2025). GEM resistance involved E2F1, mTOR, CDK1, AURKA, TPX2, TOP2A, and BIRC5. TGF-β1 drove EMT resistance via KRAS, glycolysis, and hypoxia, inducing SPOCK1, MBOAT2, COL5A1, ADAMTS6, THBS1, and FN1. Trajectory-coupled network analysis revealed an emergent bottleneck when the G1→S transition and TGF-β1-induced EMT co-occurred: CDK1 centrality spiked selectively, with CDKN1A as critical regulator. This CDK1-CDKN1A-WEE1 axis defines an "S-phase persistence" state enriched for GEM survivors. Atlas cross-examination confirmed 8.7-fold metastatic enrichment of triple-positive cells and EMT-cell-cycle coupling. Trajectory-coupled network topology analysis identifies CDK1-CDKN1A-WEE1 as a chemoresistance bottleneck corroborated in 726,107 patient cells. The framework generalizes to drug resistance across cancer types.

10
Multi-omic signatures of genetic mechanisms inform on type 2 diabetes biology and patient heterogeneity

Sevilla-Gonzalez, M.; Martinez-Munoz, A. M.; Hanson, P. A.; Hsu, S.; Wang, X.; Smith, K.; Chen, Z.-Z.; Szczerbinski, L.; Kaur, V.; Taylor, K. D.; Wood, A. C.; Mi, M. Y.; Li, H.; Wittenbecher, C.; Gerszten, R. E.; Rich, S.; Rotter, J.; Li, J.; Mercader, J. M.; Manning, A. K.; Shah, R. V. K.; Udler, M.

2026-04-25 endocrinology 10.64898/2026.04.17.26351136 medRxiv
Top 0.1%, match score 6.9%

Type 2 diabetes (T2D) is a heterogeneous disease shaped by genetic pathways related to insulin resistance and β-cell dysfunction, but how this heterogeneity is reflected molecularly remains unclear. We integrated partitioned polygenic scores (pPS) with proteomic and metabolomic profiling to define molecular signatures of T2D and their clinical relevance. We analyzed UK Biobank participants with genomic, proteomic, and metabolomic data. In a disease-free training subset, we used LASSO regression to identify multi-omic signatures associated with each pPS by jointly modeling proteins and metabolites. In an independent testing set, we constructed multi-omic scores and examined their associations with clinical traits and diabetes-related outcomes. Mediation analyses were used to investigate putative causal pathways. Key findings were evaluated in the Multi-Ethnic Study of Atherosclerosis (MESA). We identified distinct multi-omic signatures that capture the molecular architecture of T2D genetic risk across physiological subtypes. Compared with genetic scores alone, multi-omic pPS showed larger effect sizes and better disease discrimination. These scores recapitulated subtype-specific physiology and were associated with T2D risk. The Beta-Cell 2 multi-omic score showed marked stratification for insulin use, which was replicated in MESA, where it also predicted future insulin use. Mediation analyses implicated lipoprotein remodeling and fatty acid metabolism in the Lipodystrophy 1 cluster, accounting for 30-45% of the total effect of pPS on T2D risk. Integrating process-specific genetic risk with circulating multi-omic profiles reveals biologically distinct endotypes of T2D and supports a framework for improved patient stratification and risk assessment.

11
Acetate promotes nutritional adaptation in Escherichia coli

Devlin, L.; Oudard, V.; Barthe, M.; Gosselin-Monplaisir, T.; Dupin, J.-B.; Condamine, F.; Baudry, J.; Cocaign-Bousquet, M.; Millard, P.; Enjalbert, B.

2026-05-08 systems biology 10.64898/2026.05.05.722864 medRxiv
Top 0.1%, match score 6.2%

The long-held view that acetate, one of the main fermentation by-products of Escherichia coli, is toxic to microbial growth is currently being challenged. Here, we demonstrate that acetate promotes E. coli adaptation to nutrient changes by accelerating growth resumption, with as little as 250 μM acetate being sufficient to shorten the lag phase by several hours. Acetate was found to be consumed via acetyl-CoA synthetase very early after the nutrient change. Transcriptomics, metabolomics and 13C-isotope labeling experiments show that acetate replenishes metabolic pools in the tricarboxylic acid cycle and upper glycolysis. Single-cell analyses reveal that acetate increases the adaptation speed of individual cells switching to the new nutrient. We conclude that the reuse of excreted acetate by E. coli facilitates metabolic adaptation by transiently replenishing central metabolite pools. This work identifies an unexpected role of acetate in the nutritional adaptation of E. coli, providing new insights into the physiological relevance of overflow metabolism.
Highlights:
- Acetate facilitates E. coli adaptation from one nutrient to another.
- Less than 250 μM acetate is sufficient to halve lag times.
- Acetate helps replenish metabolite pools in central carbon metabolism.
- Acetate excretion is an adaptive strategy to overcome resource fluctuations.

12
Biotic-response networks are an important organizer of the transcriptome in wild Arabidopsis thaliana populations

Leite Montalvao, A. P.; Murray, K. D.; Bezrukov, I.; Betz, N.; Henry, L.; Duran, P.; Boppert, P.; Kolb, M.; TEAM PATHOCOM, ; Roux, F.; Bergelson, J.; Yuan, W.; Weigel, D.

2026-03-13 genomics 10.64898/2026.03.11.711176 medRxiv
Top 0.1%, match score 6.2%

Extensive laboratory experimentation has revealed conserved molecular pathways controlling growth and stress responses in plants, yet how these programs operate in natural settings remains poorly understood. We investigated transcriptome organization in wild populations of Arabidopsis thaliana by sampling plants from 60 natural sites in Europe and North America across two seasons. Transcriptomes varied extensively among individuals and showed largely continuous rather than discrete structure across geography and season. Although disease and microbial colonization were common in the wild, wild transcriptomes did not simply recapitulate canonical laboratory stress signatures. Measured microbial infection, environmental, and phenotypic variables explained only a modest fraction of total expression variation, but infection-associated signals accounted for the largest share of the explainable component. Consistent with this, biotic-response networks defined in controlled laboratory experiments were well conserved in wild transcriptomes, whereas control and abiotic-response networks were substantially reorganized. Together, these results suggest that while core transcriptional modules remain recognizable across environments, regulatory relationships among modules differ markedly between laboratory and natural contexts.
[Graphical abstract: Figure 1]

13
A stress-function tradeoff organizes epithelial heterogeneity across spatial scales in the human thyroid

Korem Kohanim, Y.; Barkai, T.; Novoselsky, R.; Shir, S.; Bahar Halpern, K.; Reich-Zeliger, S.; Elkahal, J.; Tessler, I.; Shivatzki, S.; Schwartz, I.; Remer, E.; Avior, G.; Hoefllin, R.; Kedmi, M.; Keren-Shaul, H.; Goliand, I.; Addadi, Y.; Golani, O.; Alon, E.; Itzkovitz, S.; Medzhitov, R.

2026-03-16 systems biology 10.64898/2026.03.12.711294 medRxiv
Top 0.1%, match score 6.2%

Many organs are organized into repeating anatomical units, yet how cellular heterogeneity is structured within and between these units remains poorly understood. Here we use spatial transcriptomics to dissect multiscale heterogeneity in the human thyroid gland, a tissue composed of hormone-producing follicles. Across human thyroid samples spanning non-inflamed to inflamed states, we develop a follicle-aware analytical framework that separates intra-follicular from inter-follicular variability. We find that heterogeneity among thyrocytes is not dominated by differences in hormone synthesis but instead by two opposing transcriptional programs: an active hormone-producing state and a damage-response thyrocyte (DRT) state enriched for stress, immune, and damage-response pathways. DRTs are spatially clustered, associated with DNA damage markers, and are enriched near immune niches. Notably, the balance between active and damage-response programs constitutes a major axis of variability across cells, follicles, and patients. Our findings highlight a damage-response epithelial thyrocyte state that may be fundamental to follicular function in the human thyroid and provide a general framework for studying heterogeneity in tissues composed of repeating anatomical units.

14
Linking Codon- and Protein-Level Mutation Scores to Population Genetics Reveals Heterogeneous Selection Efficiency Across Escherichia coli Lineages

Mischler, M.; Vigue, L.; Croce, G.; Weigt, M.; Tenaillon, O.

2026-03-18 genetics 10.64898/2026.03.16.711857 medRxiv
Top 0.1%, match score 6.2%

Quantifying the selective effects of individual mutations is essential to understanding how their frequencies in populations evolve under natural selection and genetic drift. Large genomic datasets provide a real-life experiment that we exploit to characterize the efficiency of selection across different mutation types and populations. Using Direct Coupling Analysis, a model from statistical physics, we derive protein-informed scores for individual non-synonymous mutations identified in 81,440 Escherichia coli genomes. We show that these scores act as a latent variable capturing the probability that a mutation is beneficial, neutral, or mildly to highly deleterious. We contribute to the debate on the importance of synonymous mutations by demonstrating that their selection intensities span a single order of magnitude in the E. coli species, whereas those of non-synonymous mutations span six orders of magnitude. We further relate selection efficiency to genetic drift, defined as the inverse of population size, and to ecological lifestyle, and we identify a 10,000-fold reduction in selection efficiency between the entire E. coli species and its most pathogenic populations. Together, these results highlight how population genetics and protein variant fitness predictors inform one another: variation in selection efficiency is associated with shifts in the distribution of mutation scores, and population genetics data provide a benchmark to assess the accuracy of these scores.
[Graphical abstract: Figure 1] Schematic representation of the analysis of polymorphism in 81,440 Escherichia coli genomes. 458,443 polymorphic codon sites were identified and oriented using homologous sequences from closely related species. Mutations can be classified as synonymous or non-synonymous based on whether they alter the encoded amino-acid sequence, and real-valued scores predictive of fitness effects can be assigned to mutations within each class. Codon scores reflect the global codon usage preference within the E. coli genome. DCA scores capture position- and amino-acid-specific preferences as well as epistatic constraints, and are obtained for each protein from a set of distantly related homologous sequences. Coupled with the abundance of polymorphic sites within different E. coli subpopulations, these classifications allow a precise comparison of selection intensity between mutation types and across populations with distinct lifestyles, illustrated here by their pathogenic power.

15
Paired wastewater and clinical genomics across metropolitan and hospital catchments reveals SARS-CoV-2 relevant mutations

Ruiz-Rodriguez, P.; Sanz-Carbonell, A.; Perez-Cataluna, A.; Cano-Jimenez, P.; Ruiz-Roldan, L.; Alandes, R.; Valiente-Mullor, C.; Gimeno, C.; Comas, I.; Sanchez, G.; Gonzalez-Candelas, F.; Coscolla, M.

2026-04-06 epidemiology 10.64898/2026.03.31.26346553 medRxiv
Top 0.1%, match score 4.9%

Wastewater (WW) genomics can track SARS-CoV-2 circulation beyond clinical testing, but its ability to reflect clinical diversity and capture severity-linked mutations remains unclear. Here, we integrated 845 clinical genomes and 22 wastewater genomes from Valencia, Spain, across matched metropolitan and hospital catchments. We compared matched WW and clinical sequencing for lineage and mutation surveillance at two levels: metropolitan and hospital. We then tested WW sensitivity to detect mutations statistically associated with hospitalization status in regional (n = 4,843), national (n = 10,052) and supranational (n = 39,099) clinical datasets. WW surveillance captured the dominant Omicron background when lineages were collapsed into parental lineage constellations, but had limited sensitivity for fine-scale sublineage diversity. Performance was strongly catchment dependent: metropolitan wastewater best represented broader community circulation, whereas hospital wastewater was noisier but detected KP.3 months before its appearance in routine metropolitan clinical surveillance. Across clinical datasets, hospitalisation-associated substitutions showed limited reproducibility, although the national and supranational analyses converged on the receptor-binding-domain (RBD) substitutions D405N, K417N and R408S. Interaction networks showed coupling between the N-terminal domain (NTD) substitution G252V and RBD substitutions involved in immune escape and receptor engagement. Finally, integrating regional to supranational GWAS with interaction networks and wastewater detection prioritised mutations supported by at least two independent association layers. These included Spike mutations, especially in the RBD, and the wastewater-exclusive candidate S:V445P, which was missed by contemporaneous clinical sequencing.
Overall, WW genomics preferentially recovers the common mutational backbone of SARS-CoV-2 circulation and can highlight important changes missed by clinical sampling, making it a complementary tool for real-time prioritisation of viral evolutionary change.

We found partial overlap in lineage composition between WW and clinical samples, with higher overlap at the metropolitan level (50%) than at the hospital level (30%). Conversely, we found a slightly higher overlap of individual mutations between WW and clinical samples at the hospital level (20%) than at the metropolitan level (16%), while mutations shared by both datasets were enriched in the Spike gene. Clade composition did not differ between 216 hospitalised and 528 non-hospitalised cases at the regional level. Using GWAS and Hierarchical Lasso analysis, we detected mutations associated with hospitalization status in three different datasets (regional, national and worldwide), with little overlap between them. Although few variants replicated across cohorts, the overlap between the Spain and global analyses was statistically enriched and centred on RBD substitutions (D405N, K417N, R408S). Multiple integration of genomic association results prioritised 34/191 wastewater mutations (16 in Spike), including one mutation detected only in wastewater and missed by routine clinical surveillance. Wastewater sequencing tracked dominant Omicron waves, but performance varied by catchment; integrating clinical association results with interaction network modelling helped prioritise and interpret wastewater-detected mutations.

16
Spatial imprints of emergent cardiomyocyte states in the pressure-overloaded heart

Liu, Y.; Coles, A. M.; Castiglione, J.; Venu Thiyagarajan, V.; Clifton, K.; Goyal, D.; Wu, J.; Sheridan, A.; Vujic, A.; Harris, K. M.; Manor, U.; Pereira, T. D.; Fan, J.; Lee, R. T.; Kosuri, P.

2026-05-08 genomics 10.64898/2026.05.04.721738 medRxiv
Top 0.1%, match score 4.9%

Resilience to cardiac stress is essential for health, yet the relationship between cardiomyocyte (CM) stress response and local microenvironment remains unclear. Here, we combined MERFISH spatial transcriptome profiling with Cellouette, an improved cell segmentation method, to determine CM-microenvironment relationships in a mouse model of ventricular pressure overload. We report the shape, transcription profile, spatial organization, and physical connectivity for >400,000 cells across stressed and healthy tissues. Under stress, CMs adopted a spectrum of emergent transcriptional states, with advanced states marked by a metabolic and pro-fibrotic shift. To discover CM-environment relationships, we performed a network analysis of physical cell connectivity combined with cell-type-specific profiling. We found that pro-fibrotic CM progression was tightly linked to distinct local microenvironments, and CM metabolic shifts could be inferred from transcriptional patterns in neighboring non-CM cells, revealing microenvironmental imprints of disease. We thus provide a resource for understanding the heterogeneity of outcome during cardiac pressure overload.
Highlights:
- Cellouette provides accurate segmentation for single-cell spatial transcriptomics in cardiac tissue.
- Pressure overload creates spatial gradients of cardiomyocyte pro-fibrotic states.
- Cardiomyocyte pro-fibrotic progression is linked to changes in local cell composition and gene expression.
- Transcriptional states of non-muscle cells predict metabolic state of adjacent cardiomyocytes.

17
Global quantification of mammalian gene expression noise

Welter, A. S.; Mutschler, F.; Simon, M.; Giacomelli, C.; Branscheid, A.-C.; Manukyan, A.; Teixeira Alves, L. G.; Gerwien, M.; Kerridge, R.; Landthaler, M.; Wolf, J.; Selbach, M.

2026-05-14 systems biology 10.64898/2026.05.11.724258 medRxiv
Top 0.1%
4.8%

Even cells of the same type growing in the same environment show cell-to-cell differences in protein abundance, a phenomenon known as gene expression noise. This variability can be decomposed into intrinsic components, reflecting molecular randomness, and extrinsic components, arising from differences in cellular state. While gene expression noise has been studied genome-wide in microbes, its global organization remains largely unknown in mammalian cells. Here, we develop a spike-in-based stable isotope single-cell proteomics approach that enables robust quantification of protein-level gene expression noise across thousands of human proteins. We find that protein noise scales inversely with abundance until reaching a plateau, consistent with an extrinsic noise floor and conserved scaling principles observed in bacteria and yeast. Cell cycle stage and cell size contribute substantially to protein variability but do not fully account for the observed heterogeneity. Gene-specific features such as mRNA and protein half-lives and translation efficiency show only weak associations with protein noise, and variability at the mRNA level is a weak predictor of protein variability. Instead, protein noise is largely extrinsic, with coordinated variation across proteins encoding biologically organized cellular states. Consistently, coordinated proteome programs predict intercellular differences in proteome dynamics, linking protein variability to cellular function. Together, these results provide a proteome-wide view of gene expression noise in mammalian cells, establishing that protein-level variability encodes structured and functionally relevant differences in cellular state.
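The inverse scaling with a plateau described above is often summarized with the phenomenological model CV² = a/μ + b. Here μ is mean abundance, the a/μ term shrinks as abundance rises, and b is an abundance-independent floor commonly attributed to extrinsic noise. The sketch below assumes that model, since the abstract does not state which fit the authors used. It estimates a and b by ordinary least squares on per-protein CV² values. The function names and the synthetic example are hypothetical.

```python
import numpy as np


def cv_squared(values, axis=0):
    """Squared coefficient of variation (sample variance / mean^2) across cells.

    values: cells x proteins matrix when axis=0.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=axis)
    var = values.var(axis=axis, ddof=1)
    return var / mean**2


def fit_noise_floor(mean_abundance, cv2):
    """Fit CV^2 = a / mu + b by ordinary least squares.

    a: term that decays with abundance.
    b: abundance-independent floor (the plateau), assumed to be extrinsic noise.
    """
    mu = np.asarray(mean_abundance, dtype=float)
    y = np.asarray(cv2, dtype=float)
    # Design matrix: one column for 1/mu and a constant column for the floor.
    X = np.column_stack([1.0 / mu, np.ones_like(mu)])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, b
```

On noiseless synthetic values generated as `50 / mu + 0.1`, the fit recovers a ≈ 50 and b ≈ 0.1. With real single-cell proteomics data, one would typically fit on log scales or with weights, because CV² estimates for low-abundance proteins are much noisier.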

18
Temporally Phenotyping GLP-1RA Case Reports with Large Language Models: A Textual Time Series Corpus and Risk Modeling

Kumar, S.; Weiss, J.

2026-04-06 endocrinology 10.64898/2026.04.05.26350197 medRxiv
Top 0.1%
4.8%

Type 2 diabetes case reports describe complex clinical courses, but their timelines are often expressed in language that is difficult to reuse in longitudinal modeling. To address this gap, we developed a textual time-series corpus of 136 PubMed Open Access single-patient case reports involving glucagon-like peptide 1 receptor agonists, with clinical events associated with their most probable reference times. We evaluated automated LLM timeline extraction against gold-standard timelines annotated by clinical domain experts, assessing how well systems recovered clinical events and their timings. The best-performing LLM produced high event coverage (GPT5; 0.871) and reliable temporal sequencing across symptoms (GPT5; 0.843), diagnoses, treatments, laboratory tests, and outcomes. As a downstream demonstration, time-to-event analyses in diabetes suggested lower risk of respiratory sequelae among GLP-1 users versus non-users (HR=0.259, p<0.05), consistent with prior reports of improved respiratory outcomes. Temporal annotations and code will be released upon acceptance.

19
The Limits of Cross-Species WGCNA: Library Imbalance and Signal Dilution Constrain Effector Gene Recovery in Dual-Organism RNA-seq

Fenn, A.; Hueckelhoven, R.; Kamal, N.

2026-05-05 systems biology 10.64898/2026.04.30.721941 medRxiv
Top 0.1%
4.8%

Dual-organism RNA sequencing (RNA-seq) experiments, in which the transcriptomes of a host and a microbe are sequenced simultaneously, are increasingly used to study plant-microbe interactions. A central analytical goal is identifying effector proteins and their host targets through gene co-expression. Weighted Gene Co-expression Network Analysis (WGCNA) is the dominant tool for gene co-expression analyses, yet its ability to recover interaction-interface genes from a merged dual-organism matrix has not been systematically characterised. Here we present a simulation framework using real gene models from Hordeum vulgare (barley) and Blumeria graminis f. sp. hordei M.Liu & Hambl. (powdery mildew) to evaluate single-network WGCNA across a gradient of plant-to-fungal library size ratios (1:1 to 20:1), three levels of co-expression signal strength, and three WGCNA network construction types (signed, unsigned, signed hybrid). We embed 20 model effector genes (bridge genes) driven by a mixed host-pathogen eigengene and evaluate recovery using four metrics aligned with the biological objective: cross-species hub rank, top-decile hub enrichment, bridge gene detection rate, and bridge co-separation (the fraction of effector-target pairs co-assigned to the same detected module). Across 225 simulation runs (15 conditions × 5 replicates × 3 network types), bridge genes are robustly identifiable as cross-species connectivity hubs (mean rank 0.92 versus 0.50 for module genes), but co-assignment of effector-target pairs to the same module fails in 41% of runs due to scale-free topology collapse. Signal strength (2 = 0.12) and library ratio (2 = 0.22) are the primary determinants of co-separation, while network type choice accounts for less than 2%. A read-depth bias systematically inflates pathogen gene hub ranks relative to host genes at high ratios.
These results establish that the method can identify effector candidates as cross-species hubs under a broad range of conditions, but reliable co-assignment requires adequate pathogen read depth and strong co-expression signal, properties that experimental design, not analytical parameterisation, must provide.
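The bridge co-separation metric is defined above only in words: the fraction of effector-target pairs co-assigned to the same detected module. The sketch below computes it under two assumptions. Module assignments are available as a gene-to-module mapping, and unassigned genes carry WGCNA's conventional "grey" label, so a pair that lands together in grey does not count as co-assigned. The function name and the gene IDs in the example are hypothetical placeholders, not barley or powdery mildew identifiers.

```python
from typing import Mapping, Sequence, Tuple

UNASSIGNED = "grey"  # WGCNA's label for genes not placed in any module


def bridge_co_separation(
    modules: Mapping[str, str], pairs: Sequence[Tuple[str, str]]
) -> float:
    """Fraction of (effector, target) pairs assigned to the same non-grey module.

    Genes missing from `modules` are treated as unassigned.
    """
    if not pairs:
        raise ValueError("need at least one effector-target pair")
    hits = 0
    for effector, target in pairs:
        m_eff = modules.get(effector, UNASSIGNED)
        m_tgt = modules.get(target, UNASSIGNED)
        # Sharing the grey "module" is not a real co-assignment.
        if m_eff != UNASSIGNED and m_eff == m_tgt:
            hits += 1
    return hits / len(pairs)
```

For example, suppose one pair shares the blue module, one pair is split between turquoise and brown, and one pair is entirely grey. The score is then 1/3.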

20
PEXMap: A proteogenomic method for exon and isoform level mapping of mass spectrometry derived peptides

Awasthi, D.; Verma, P.; Pandit, S. B.

2026-05-04 systems biology 10.64898/2026.04.29.721330 medRxiv
Top 0.1%
4.7%
Show abstract

Alternative splicing (AS) expands transcriptome and proteome diversity by differentially combining exons or their splice variants. Although RNA-seq studies have uncovered transcriptomic variability, the corresponding protein-level diversity remains poorly understood. Mass spectrometry-based proteomics provides protein-level insights through MS/MS peptide annotations, which are mostly linked to gene, transcript, or UniProt identifiers. However, tracing these peptides to specific isoforms remains challenging because exon mappings are lacking or annotations are inconsistent. We developed PEXMap (PeptideEXonMapper), a k-mer-based proteogenomic framework that systematically maps MS/MS peptides to genes, transcripts, exons, or exon-exon junctions by exactly matching unique 8-mers derived from MS/MS peptides to those in reference databases built from exon-resolved isoforms. PEXMap mappings of the human proteome from PeptideAtlas were concordant with PeptideAtlas annotations. Applying PEXMap to liver and pancreas proteomes, we identified tissue-specific isoform expression, and we annotated the cancer proteome in the same way. Reliable PEXMap mappings could provide insights into the role of AS in shaping proteomes across tissues and disease states. Source code is publicly available on GitHub (https://github.com/deepanshicbg/PEXMap) and is supported on Linux.
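The core idea of exon-level peptide mapping with 8-mers can be illustrated with a small index. The sketch below is hypothetical and is not PEXMap's implementation. It assumes each isoform is supplied as an ordered list of exon segments already translated to amino acids, which ignores codons split across exon boundaries. Every 8-mer of each isoform is indexed, a peptide is seeded with its first 8-mer, and each seed is verified by exact matching over the whole peptide. The sketch then reports which exons the match spans; two or more exons indicate an exon-exon junction. Isoform IDs and sequences in the example are invented.

```python
from collections import defaultdict
from typing import Dict, List

K = 8  # k-mer length, following the abstract


def build_index(isoforms: Dict[str, List[str]], k: int = K):
    """Index every k-mer of each isoform's protein sequence.

    isoforms maps a (hypothetical) isoform ID to its ordered exon segments,
    each given as an amino-acid string.
    Returns (kmer -> [(isoform_id, start)], isoform_id -> (sequence, exon_ends)).
    """
    index = defaultdict(list)
    layout = {}
    for iso_id, exons in isoforms.items():
        seq = "".join(exons)
        ends, total = [], 0
        for exon in exons:
            total += len(exon)
            ends.append(total)  # exclusive end of each exon in protein coordinates
        layout[iso_id] = (seq, ends)
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((iso_id, i))
    return index, layout


def exons_spanned(start, end, ends):
    """Return 1-based exon numbers overlapped by the half-open interval [start, end)."""
    spanned, exon_start = [], 0
    for n, exon_end in enumerate(ends, start=1):
        if start < exon_end and end > exon_start:
            spanned.append(n)
        exon_start = exon_end
    return spanned


def map_peptide(peptide, index, layout, k=K):
    """Map a peptide to (isoform, exons) hits via its first k-mer, verified exactly."""
    if len(peptide) < k:
        return []  # too short to seed with a k-mer
    hits = []
    for iso_id, start in index.get(peptide[:k], []):
        seq, ends = layout[iso_id]
        end = start + len(peptide)
        if seq[start:end] == peptide:  # exact match over the full peptide
            hits.append((iso_id, tuple(exons_spanned(start, end, ends))))
    return hits
```

Suppose isoform ISO-B skips the middle exon of ISO-A. A peptide crossing the exon 1 to exon 3 junction of ISO-A then maps only to ISO-B, as a junction of its exons 1 and 2, which makes it isoform-specific evidence. A peptide lying wholly inside the shared last exon maps to both isoforms and cannot distinguish them. A fuller tool would also filter for k-mers unique to one gene or isoform, which this sketch does not do.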